A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning
Authors: A. Geramifard, T. J. Walsh, S. Tellex, G. Chowdhary, N. Roy, and J. P. How
Abstract
A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs. This article reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approximate dynamic programming methods such as trajectory-based value iteration, and finally moves to reinforcement learning methods such as Q-Learning, SARSA, and least-squares policy iteration. We describe algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each. Empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance. A. Geramifard, T. J. Walsh, S. Tellex, G. Chowdhary, N. Roy, and J. P. How. A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning. Foundations and Trends® in Machine Learning, vol. 6, no. 4, pp. 375–451, 2013. DOI: 10.1561/2200000042.
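To make the setting concrete, the following is a minimal sketch (not taken from the tutorial itself) of one of the reinforcement learning methods it covers: Q-Learning with a linear function approximator, where the action-value function is represented as Q(s, a) ≈ θ_a · φ(s). The environment interface (env.reset, env.step) and the feature map phi are hypothetical placeholders, and a simple one-hot encoding stands in for a real feature representation.

    import numpy as np

    def phi(state, num_features):
        # Hypothetical feature map: a one-hot encoding of a discrete state.
        features = np.zeros(num_features)
        features[state] = 1.0
        return features

    def q_learning_linear(env, num_features, num_actions,
                          episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        # One weight vector per action; Q(s, a) = theta[a] . phi(s).
        theta = np.zeros((num_actions, num_features))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                feats = phi(state, num_features)
                q_values = theta @ feats
                # Epsilon-greedy action selection.
                if np.random.rand() < epsilon:
                    action = np.random.randint(num_actions)
                else:
                    action = int(np.argmax(q_values))
                # Assumed environment step signature: (next_state, reward, done).
                next_state, reward, done = env.step(action)
                # TD target bootstraps from the greedy value of the next state.
                next_q = 0.0 if done else np.max(theta @ phi(next_state, num_features))
                td_error = reward + gamma * next_q - q_values[action]
                # Semi-gradient update of the weights for the taken action.
                theta[action] += alpha * td_error * feats
                state = next_state
        return theta

With a richer feature map (e.g., tile coding or radial basis functions), the same update rule generalizes across similar states, which is the central theme the tutorial develops.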
Similar resources
Reinforcement Learning Applied to Linear Quadratic Regulation
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for proble...
Competitive Function Approximation for Reinforcement Learning IRI Technical Report
The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programmin...
Stable Function Approximation in Dynamic Programming
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that gen...
High-accuracy value-function approximation with neural networks applied to the acrobot
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy ...
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
On large problems, reinforcement learning systems must use parame-terized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative...
Journal title: Foundations and Trends in Machine Learning
Volume: 6, Issue: 4
Pages: 375–451
Publication year: 2013